SNPs in genes related to the repair of damage to DNA in clinical isolates of M. tuberculosis: A transversal and longitudinal approach

The presence of SNPs in genes related to DNA damage repair in M. tuberculosis can trigger hypermutagenic phenotypes with a higher probability of generating drug resistance. The aim of this research was to compare the presence of SNPs in genes related to DNA damage repair between sensitive and DR isolates, as well as to describe the dynamics in the presence of SNPs in M. tuberculosis isolated from recently diagnosed TB patients of the state of Veracruz, Mexico. The presence of SNPs in the coding regions of 65 genes related to DNA damage repair was analyzed. Eighty-six isolates from 67 patients from central Veracruz state, Mexico, were sequenced. The results showed several SNPs in 14 genes that were only present in drug-resistant genomes. In addition, by following of 15 patients, it was possible to describe three different dynamics of appearance and evolution of non-synonymous SNPs in genes related to DNA damage repair: 1) constant fixed SNPs, 2) population substitution, and 3) gain of fixed SNPs. Further research is required to discern the biological significance of each of these pathways and their utility as markers of DR or for treatment prognosis.


Introduction
With more than 10 million new cases and 1.6 million deaths in 2021, tuberculosis (TB) remains one of the most important infectious diseases worldwide [1].The increase in drugresistant (DR) strains of M. tuberculosis, the greater susceptibility of hosts with diabetes mellitus and HIV/AIDS to contract the active infection, and the decrease in preventive and diagnostic activities by health systems in the face of the recent SARS-Cov2 pandemic, are the elements that currently influencing the severity of this pandemic [2].
SNPs are the main source of variation in M. tuberculosis [3,4], and their occurrence is influenced by drug treatment, host environment [5], and oxidative stress in the bacterium [6].The high polymorphic presence in genes related to the DNA damage repair system (GRDDR) in M. tuberculosis has gained interest in recent years [7,8].GRDDRs play a key role in genomic integrity and diversification, and thus in drug resistance [9,10].Given this, it has been documented that the presence of SNPs in some GRDDRs can trigger hypermutagenic phenotypes that are more likely to generate DR [11][12][13] or conditions that can limit the survival of the bacterium [14].
Although research focused on the DNA damage repair system in M. tuberculosis has identified 65 GRDDRs, as well as the involvement of some of these in DR acquisition pathways [7], the polymorphic diversity of these genes, their usefulness as DR-TB markers, and the dynamics of SNP generation in GRDDRs during anti-TB treatment are still unknown.The objective of this research is therefore to compare the presence of SNPs in GRDDR between sensitive and DR isolates, as well as to describe the dynamics in the presence of SNPs in M. tuberculosis isolated from recently diagnosed TB patients of the state of Veracruz, Mexico.

Collection and processing of isolates
Collection of clinical isolates of M. tuberculosis was performed by personnel from the State Health Services of Veracruz, Mexico, from January 2021 to June 2022.The sample size was based on convenience and depended directly on the number of patients diagnosed with TB by the health jurisdictions No. V and VII of the state of Veracruz, which are located in the municipalities of Xalapa and Orizaba, respectively.
Isolate processing was performed by collecting 5-15 ml of sputum from patients diagnosed with TB before initiating anti-TB treatment, with subsequent sampling every four weeks.Isolates were processed by Petroff's method [15] and seeded on a solid Lo ¨wenstein-Jensen medium for ±8 weeks.Genomic DNA was then extracted and purified following the previously described CTAB method [16].DNA was quantified using a nanodrop (ThermoScientific, USA).Only samples with DNA concentration >20 ng/μl, and a purity of 1.8-2.0 in ratio of absorbance at 260/280, and 2.0-2.2 in ratio of absorbance at 260/230, were considered for sequencing.The WGS libraries were prepared according to the Nextera XT (Illumina, San Diego, CA, USA) protocol, using 1 ng/mL of DNA previously quantified by a Qubit fluorometer (Invitrogen, Waltham, CA, USA).Quality control of the libraries was conducted using TapeStation (Agilent Genomics, Santa Clara, CA, USA), which were normalized and sequenced using NexSeq 500 (Illumina, San Diego, CA, USA) in a 2 × 150 paired-end format.Genome sequences are available in the NCBI repository under bioproject number PRJNA1041872 accessible through the link https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA1041872.

Bioinformatic analysis of the genomes
Low-quality ends (<30) were removed from the sequences using Fastp [17], then Kraken V.2 [18] and SeqTK V1.3 [19] were used to filter reads belonging to the MTBC complex [20] and avoid false variants as a result of DNA contamination.
Variant calling was performed using the MTBseq [21] and Tbprofiler [22] pipelines, an automated and modular process specialized in the mycobacteria of the M. tuberculosis complex, capable of identifying variants associated with resistance and phylogeny.By default, the pipeline uses the genome of M. tuberculosis H37Rv (NC_000962.3) as the basis for reference mapping, variant calling, and annotation.As a quality control, the genomes were also analyzed by PhyReSE [23] (https://bioinf.fz-borstel.de/mchips/phyresse/),identifying non-fixed allelic variants in the M. tuberculosis genomes.
Only sequences with a mean genome coverage >40X were considered.Variants in 20 reads and at �90% frequency were called fixed SNPs.Variants in at least 10 reads with a frequency of �10% to �90% were called no-fixed SNPs and were used to detect antibiotic resistance and allelic variation in GRDDRs.
From the variant calling, variants located in the coding regions of the 65 GRDDRs were identified and selected (S1 Table ).The final database was constituted with the SNPs identified, resistance profile, lineage of the bacterium, patient ID, and date of isolate collection.
The phylogeny was constructed using a concatenated alignment of fixed SNPs from all recovered clinical isolates, performed with IQ-TREE software [24] (http://iqtree.cibiv.univie.ac.at/) using default parameters.The consensus tree generated was visualized with iTOL [25] (https://itol.embl.de/).The fixation index was used to identify SNPs in GRDDRs phylogenetically related to each M. tuberculosis sublineage.The fixation index (Fts = 1) indicates that a SNP is present in all individuals of the same lineage/sublineage but is not present outside of it.The fixation index was calculated with the Genepop package for Rstudio [26].

Statistical analysis
Differences between sensitive and DR isolates were analyzed using Fisher's exact test with the program IBM SPSS V21 [27] (95% confidence level), excluding the SNPs phylogenetically associated with the sublineages.Comparison between consecutive intakes from the same patient was performed descriptively, based on the SNPs phylogenetically related to the infecting sublineage in each case.

Ethical considerations
The project received approval from the Research Ethics Committee of the Institute of Health Sciences at Universidad Veracruzana (No. D_074).Written informed consent was obtained from all patients who provided sputum samples.

Variants in DNA repair genes
A total of 69 non-synonymous SNPs were identified and distributed among 38 GRDDRs in M. tuberculosis.The genes with the highest number of SNPs were LigD (5 sites), DnaE2, and PolA (4 sites).The only SNPs identified in UvrD2 caused an early stop codon, while in End (Nfo) only a single deletion was observed.Although most of the SNPs identified were fixed in the genomes (allele frequency >95%), the MutY, SSBb, RuvB, RecD, and AdnA genes accounted for most of the SNPs with allelic variation, with a frequency of 5-94.9%.
Fixed variants associated with sublineages.The phylogenetic analysis of non-synonymous SNPs enabled the identification of 18 fixed SNPs related to the different sublineages that made up the sample (Table 2).It is worth mentioning that the presence of an L4.1.1 genome within the L4.1.1.3lineage cluster could be due to the lack of coverage in a region determining its classification, however, due to its similarity it is likely to belong to L4.1.1.
Not considering SNPs related to sublineages (Table 2), the comparison between isolates showed significant differences in the presence of 12 SNPs in DR genomes and three SNPs present in sensitive isolates (Table 3).Among these, the presence of non-synonymous SNPs in the genes DinB2 or DnaE2 observed in 71.4% of the DR genomes was notable.On the other hand, the presence of non-synonymous SNPs in the Nei1 gene was observed in 31% of the sensitive isolates.

Changes in the presence of SNPs in GRDDR during anti-TB treatment
The longitudinal analysis allowed us to follow the changes in GRDDRs in M. tuberculosis before and after the standardized anti-TB treatment (R/Z/E/P).Through a follow-up of 15 patients, it was possible to describe three different dynamics of appearance and evolution of non-synonymous SNPs in GRDDR (Fig 2).
1. Constant fixed SNPs: these are related to the sublineages and remain constant between the different isolates of a patient (Fig 2A -2G).Although allelic diversification was observed in some loci, it was sporadic and they therefore did not become fixed as dominant SNPs in the infecting population.This dynamic was mainly observed in sensitive genomes with the sublineages L4.1, 4.3.3,and 4.3.4.2.The constancy of fixed SNPs between consecutive isolates was also observed in three patients with DR-TB caused by L4.1.1.3.

Discussion
The present study represents the first approach focused on the allelic diversity of GRDDRs in M. tuberculosis circulating in affected population.The analysis allowed us to identify some SNPs present only in DR isolates, as well as to observe allelic variation in genes recently related to hypermutagenic TB phenotypes.In addition, analysis of consecutive samples allowed us to describe, for the first time, how the presence of SNPs in GRDDRs can change between preand post-treatment isolates, following three different dynamics.It is worth mentioning that the small number of samples limited the analysis to a descriptive scope.SNPs observed only in DR isolates were located in the low-fidelity polymerases DinB2 [29] and DnaE2 [30], as well as the polymerase Prim-polC [31], which reportedly participate in induced mutagenesis in M. tuberculosis; the ligases LigA and LigB [7].SNPs were also observed in the Uvr-BC genes involved in DNA damage recognition and initiation of repair by nucleotide excision [7], in which the presence of mutations has been associated with failure of the repair pathway and consequently increased mutations and drug resistance [12].In the same sense, they were identified in the Adn-AB dimer, a participant in the initiation of DNA cleavage repair by RecR-dependent homologous recombination [7].The uracil-specific glycosylase Ung [32], is a gene in which the presence of SNPs is linked to the acquisition of DR in the Haarlem lineage [33].Also were observerd the ribonucleotide glycosylase RNaseH1 [7]; the non-functional MutM/Fpg homolog, Fpg2 [7]; as well as Dut, an enzyme involved in the nucleotide pool sanitization of dUTP and dCTP [7].Although the presence of differential SNPs in DR isolates may be due to selective treatment pressure, some compensatory mechanism, or related to an increased adaptability of the bacterium, their implications for disease control, and their utility as surrogate markers of DR-TB are unknown.Although none of the SNPs with low allele frequency showed significant associations with DR, the allelic variation in MutY (4030721, Ala77Pro) and RuvB (2923293, Arg314Pro) present in different sublineages could suggest a homoplasy event.Although RuvB is a protein involved in the resolution of branched DNA structures [34], our previous observations suggest that the occurrence of SNPs in RuvB is more frequent in the presence of DMT2 in the host [35].On the other hand, recent research has shown that some mutations in MutY abrogate its functions generating a hypermutagenic phenotype, with increased survival and rapid acquisition of resistance mutations for rifampicin and ciprofloxacin [12,36].On the other hand, the presence of SNPs in Mfd and RecB observed in sensitive genomes of L4.1.1 with >9 weeks of anti-TB treatment could reflect positive selection for anti-TB drugs, while the absence of SNPs related to drug resistance in these isolates could indicate that the SNPs in Mfd and RecB are related to persistence or tolerance pathways of M. tuberculosis.
Analysis of consecutive isolates showed that the presence of SNPs in GRDDR can fluctuate during the anti-TB treatment.Consequently, it was possible to elucidate three distinct dynamics: 1) constant fixed SNPs, 2) population substitution, and 3) gain of fixed SNPs.Although SNP gain and population substitution have been previously documented as adaptive pathways of M. tuberculosis [37][38][39], our findings represent the first report of this perspective of analysis using only SNPs in GRDDR, highlighting the importance of these genes, and their diversification, in the early months of anti-TB treatment.
The constant presence of fixed SNPs among serial isolates could imply a functional GRDDR SNP configuration, both for infection and for coping with the anti-TB treatment.Although allelic variation in some other genes was observed in this configuration, it was transient and was not observed to be fixed in the dominant population.This is consistent with other studies documenting long-term infections, with no changes in the presence of fixed SNPs in populations of M. tuberculosis [39,40].
On the other hand, population substitution could indicate that the configuration of SNPs in GRDDR observed in the initial strain was unable to withstand the damage generated by drug therapy, such that the initial population is no longer observed as predominant and is replaced by bacteria with a different configuration of SNPs [38,41].We can also theorize that the event of population substitution will depend on the diversity of bacterial subpopulations present in the host.In this regard, observations in some M. tuberculosis infections with the L4.1.1 sublineage suggest that genomes with non-synonymous SNPs in RecB and Mfd were better able to tolerate drugs, even if they were not the dominant populations identified at the onset of symptomatology and diagnosis by health services.
The gain of fixed SNPs in GRDDR was the least observed dynamic in the sample and its presence opens new questions about its origin.Although the difference in SNPs observed between consecutive isolates could indicate the selective pressure of pre-existing bacterial subpopulations in the face of drug treatment, it is also possible that these SNPs were generated de novo in the dominant population.The fact that the only case where a gain of fixed SNPs was observed was a sensitive TB (L4.1.1)could indicate that the diversification of GRDDRs is one of the prior routes for DR acquisition, the origin of which could be related to the time of drug exposure and host-specific factors.
The results suggest the need to continue investigating the origin of SNPs in GRDDR and their dynamics during antituberculosis treatment, particularly to rule out that the presence of SNPs in this group of genes is associated with stochastic processes, neutral evolution, or gene carry-over events.
With regard to the diversity of M. tuberculosis observed in the state of Veracruz, the prevalence of Euro-American (L4) lineages of TB in Mexico has been documented since the last decade [42].However, the incidence of lineage X (L4.1.1 and L4.1.1.3)increased in the region, from ~13% (2012-2021) [43,44], to the 42% reported in this study.This is of concern given the high frequency of drug resistance in lineage L4.1.1.3previously identified in this [45,46] and other regions [47].

Conclusion
Analysis of M. tuberculosis isolates showed SNP variations in GRDDR between drug-resistant and drug-sensitive isolates.Furthermore, a comparison of pre-and post-treatment isolates indicates that the diversification of SNPs in GRDDR could have three distinct dynamics: 1) constant fixed SNPs, 2) population substitution, and 3) gain of fixed SNPs.However, further research is required to discern the biological significance of each of these pathways and their utility as markers of DR or for treatment prognosis. https://doi.org/10.1371/journal.pone.0295464.t002

Fig 2 .
Fig 2. Dynamics of SNPs in genes related to DNA damage repair during treatment.W: weeks.Arrow: changes over time.Circles: percentage of allele frequency in the bacterial population.https://doi.org/10.1371/journal.pone.0295464.g002

Table 3 . Genes and SNPs related to sensitive and drug-resistant genomes.
DR: drug resistance.AA: Amino acid change.Calculated based on the number of patients.SNPs related to sublineages are excluded.Only significant results are shown through Fisher's exact test (95%CI).* SNPs with allelic variation.** Considering all non-synonymous SNPs identified in the sample.https://doi.org/10.1371/journal.pone.0295464.t003